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Mathematical equations are fundamental to modeling biological networks, but as networks get 
large and revisions frequent, it becomes difficult to manage equations directly or to combine 
previously developed models. Multiple simultaneous efforts to create graphical standards, rule- 
based languages, and integrated software workbenches aim to simplify biological modeling but 
none fully meets the need for transparent, extensible, and reusable models. In this paper we 
describe PySB, an approach in which models are not only created using programs, they are 
programs. PySB draws on programmatic modeling concepts from little b and ProMot, the rule-based 
languages BioNetGen and Kappa and the growing library of Python numerical tools. Central to PySB 
is a library of macros encoding familiar biochemical actions such as binding, catalysis, and 
polymerization, making it possible to use a high-level, action-oriented vocabulary to construct 
detailed models. As Python programs, PySB models leverage tools and practices from the open- 
source software community, substantially advancing our ability to distribute and manage the work 
of testing biochemical hypotheses. We illustrate these ideas using new and previously published 
models of apoptosis. 
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Introduction 

Mechanistic studies that couple experimentation and compu- 
tation typically rely on models optimized for specific questions 
and biological settings. Such 'fit-to-purpose' models can 
effectively describe and elucidate complex biological pro- 
cesses, but given the available data, they are usually restricted 
in scope, encompassing only a subset of cellular biochemistry 
(Batchelor et al, 2008; Xu et al, 2010; Huber et al, 2011 ; Schleich 
et al, 2012). Even in disciplines in which modeling is more 
mature, all-encompassing models are rare. However, it is 
common for fit-to-purpose models to require changes invol- 
ving the addition, modification or elimination of species and 
reactions based on new discoveries. Often a family of models is 
needed (Albeck et al, 2008b; Muzzey et al, 2009; Rehm et al, 
2009; Xu et al, 2010; Kleiman et al, 2011), each of which 
represents a competing molecular hypothesis derived from the 
literature, a different way of encoding ambiguous 'word 
models' drawn from molecular biology, or postulated differ- 
ences in a network from one cell type to the next (Gnad et al, 
2012). One of the promises of systems biology is that 
collaborative and iterative construction and testing of models 
can improve hypotheses by subjecting them to an ongoing 
process of testing and improvement. However, the current 



proliferation of independently derived fit-to-purpose models 
frustrates this goal (Krakauer et al, 2011). We require 'second 
generation' tools that leverage existing approaches to biologi- 
cal model construction and documentation while adding new 
means for modifying, comparing and sharing models in a 
transparent manner. 

Dynamic biological systems are generally modeled using 
stochastic or deterministic rate equations. The latter can be 
described using networks of ordinary differential equations 
(ODEs) that precisely represent mass action kinetics. However, 
when a network model has many species and variables, 
equations become a poor tool for model development and 
understanding. Even familiar biochemical processes are 
remarkably complex at the level of equations: for example, 
fully accounting for the binding and posttranslational mod- 
ifications underlying activation of growth factor receptors can 
require thousands of equations that are tedious to generate, 
hard to error-check and difficult to understand (Hlavacek et al, 
2006). The need for frequent updates is also a challenge 
because even a simple modification of a biochemical cascade 
can require dozens of small changes in the corresponding 
equations. When operating at the level of equations, it is also 
difficult to reuse the work of others directly. For example, in the 
field of apoptosis, Howells et al (2010) described a conceptual 
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extension of a previously published model of Bcl-2 proteins 
(which control mitochondrial outer-membrane permeabiliza- 
tion (MOMP), Box 1) (Chen et al, 2007a) by adding the BH3- 
only Bcl-2 family member Bad. Interactions among the core set 
of Bcl-2 proteins were identical between the two models, but 
Howells et al (2010) rewrote the original set of ODEs simply to 
add a few new reactions. Manually rebuilding earlier models is 
not only time-consuming but also error-prone: as described in 
detail below, the practice has introduced errors and unin- 
tended changes in another pair of related apoptosis models. 
Moreover, the tendency to make numerous trivial changes in 
duplicated elements (e.g., by renaming species) makes it 
difficult to focus on key differences, frustrating later attempts 
at model comparison (Mallavarapu et al, 2008) . 

Several projects are underway to address the problems of 
model proliferation and complexity using formalisms that aim 
for abstraction, transparency and reusability. Chief among 
these is rule-based modeling (Hlavacek et al, 2006; Bachman 
and Sorger, 2011) in which models are created using 
specialized languages such as BioNetGen language (BNGL) 
(Faeder et al, 2009) or Kappa (Danos et al, 2007b). These 
languages describe local interaction rules between specific 
domains or 'sites' on molecules (e.g., enzymes and their 
substrates) and are easier to understand and reuse than 



Box 1 TRAIL-mediated apoptosis and the Bcl-2 protein family 



equations (Danos, 2007). Rule-based approaches enable 
modeling of otherwise intractably complex systems in which 
posttranslational modification and the formation of multi- 
protein signaling complexes give rise to large numbers of 
distinct species (Blinov et al, 2006; Sneddon et al, 2010; Deeds 
et al, 2012). Rules can also be subjected to formal analysis 
(Danos et al, 2008; Feret et al, 2009) and used to generate both 
deterministic and agent-based simulations (Danos et al, 2007a; 
Sneddon et al, 2010). 

Although powerful, rule-based languages such as BNGL and 
Kappa do not exploit 'higher-order' patterns in biochemical 
systems such as multi-step catalysis, scaffold assembly, 
polymerization, receptor internalization, etc. These patterns 
often recur several times in a single model and also are 
frequently encountered in models of different molecular 
networks. Currently, it is necessary to regenerate the rule sets 
for biochemical patterns each time the patterns arise. The 
creation of models having related but variant topologies 
presents an important special case of a higher-order pattern, 
particularly when a core process remains the same across all 
models and modifications focus on specific reactions. In both 
rule- and ODE-based models, it is necessary to implement the 
change for each model in the set, a laborious process when 
the number of models is large. Modeling tools that leverage 




TRAIL is a prototypical pro-death ligand that binds transmembrane DR4 and DR5 receptors and leads to formation of the intracellular, multi-component death- 
inducing signaling complex (DISC). Autocatalytic processing of initiator procaspases-8 and -1 0 at the DISC allows the enzymes to cleave procaspase-3 but caspase- 
3 activity is held in check by XIAP, an E3 ubiquitin ligase that blocks the caspase-3 active site and targets the enzyme for ubiquitin-mediated degradation. In most cell 
types, activation of caspase-3 and consequent cell killing requires MOMP. MOMP allows translocation of cytochrome cand Smac into the cytosol where Smac binds 
and inactivates XIAP and cytochrome c activates pro-caspase-9, two reactions that generate cleaved, active caspase-3. 

MOMP is regulated by a family of ~20 Bcl-2 proteins (Youle and Strasser, 2008) having three functions: pro-apoptotic effectors Bax and Bak assemble into 
pores, anti-apoptotic Bcl-2, Bcl-XL, Mcl-1 , Bcl-W, and A1 proteins block pore formation, and the 'BH3-only proteins' such as Bid, Bim, and Puma activate the effectors 
and inhibit the anti-apoptotics. Bid, the most important BH3-only protein for extrinsic cell death, is a direct substrate of caspases 8/10 and its active form (tBid) binds 
to and activates Bax and Bak via a recently elucidated structural transition (Kim et al, 2009; Gavathiotis etal, 2010). The overall pathway can be roughly divided into a 
'receptor to Bid' module (yellow in the figure), a 'pore to PARP' module (blue), and a MOMP module (orange). 

Structural and cellular studies of Bcl-2 proteins are consistent with a variety of different mechanisms for MOMP. A direct model posits that effectors form pores only 
when activated by proteins such as tBid (Letai etal, 2002; Kim etal, 2006; Ren etal, 2010). The indirect model proposes that Bax and Bak are constitutively able to 
form pores but are held in check by anti-apoptotic Bcl-2 proteins (Willis etal, 2007). Recent studies support a combination of both direct and indirect mechanisms 
(Merino et al, 2009; Leber et al, 2010; Llambi et al, 2011). The 'embedded together' model emphasizes the active role of membranes in determining the 
conformational states and activity of Bcl-2 proteins and that the anti-apoptotic Bcl-2 proteins possess all of the same functional interactions as the effectors except 
pore formation and therefore function as dominant-negatives (Billen etal, 2008; Leber etal, 2010). Controversy about MOMP mechanisms reflects the large number 
of Bcl-2 proteins involved, the role of protein compartmentalization and localization in activity, the diversity of apoptosis inducers, and the fact that different cell types 
express different Bcl-2 proteins at very different levels. Detailed mechanistic models of MOMP are nonetheless important for rationalizing the selectivity of anti-Bcl-2/ 
Bcl-XL drugs, such as ABT-263, understanding the oncogenic potential of proteins, such as Mcl-1 and Bcl-2, and elucidating the precise mechanisms of action of pro- 
apoptotic chemotherapeutics. 
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approaches from software engineering are one way to increase 
reusability and reduce duplication (Mallavarapu et al, 2008; 
Pedersen and Plotkin, 2008; Danos et al, 2009; Mirschel et al, 
2009; Gnad et al, 2012). In particular, LISP frameworks such as 
little b (Mallavarapu et al, 2008) and ProMot (Mirschel et al, 
2009) have demonstrated the value of programmatic 
approaches. However, ProMot does not use rules, limiting its 
effectiveness for combinatorially complex systems; little b, 
while implementing rules internally, does not interoperate with 
tools and languages from the broader rule-based modeling 
community and is no longer in development (the similarities 
and differences between the little b and ProMot approaches 
have been described previously (Mallavarapu et al, 2008)). 
Combining the strengths of rule-based and programmatic 
approaches to modeling is a key goal of the work described here. 

A benefit of modeling biological systems using contemporary 
approaches from computer science and open-source software 
engineering is the ready availability of tools and best practices 
for managing and testing complex code. Good software 
engineering practice promotes abstraction, composition and 
modularity (Mallavarapu et al, 2008; Mirschel et al, 2009). 
Through abstraction, the core features of a concept or process 
are separated from the particulars: for example, a pattern of 
biochemical reactions (e.g., phosphorylation-dephosphorylation 
of a substrate) is described once in a generic form as a 
subroutine and then instantiated for specific models simply by 
specifying the arguments (e.g., species such as Raf, PP2A, and 
MEK). In programming, abstraction is achieved through the 
use of parameterizable functions or macros that are written 
once and then invoked as needed. Functions can be built up 
from other functions, a process known as composition. 
Abstraction and composition can occur at all levels of 
complexity: just as complex functions can be built from simple 
functions, large programs can be built up from smaller 
subsystems that are documented and tested individually. 
When these subsystems have well-defined input-output 
interfaces, they can be used as libraries that make it possible 
to write new programs using a simple vocabulary of well- 
tested concepts (e.g., a library of biochemical actions or core 
pathways such as the MAPK cascade) (Pedersen and Plotkin, 
2008) . The decomposition of complex biological models in this 
fashion facilitates extensibility and transparency, because 
well-developed mechanisms can be reused and changes can 
be localized to the subsystem that needs revision. 

Contemporary software engineering has much to teach us 
about the difficult task of developing and documenting models 
in a distributed setting. Software engineers 'publish' their 
findings using robust programming tools that support code 
annotation, documentation, and verification, all significant 
challenges in biological modeling (Hlavacek, 2009). The 
open-source software community also provides a valuable 
socio -cultural framework for managing large, collaborative 
projects in the public domain. Version control tools such as Git 
and Subversion, along with 'social coding' websites such as 
GitHub, have facilitated the collaborative development of 
software as complex as the kernel of the Linux operating 
system (http://github.com). It would be highly desirable to 
exploit such social and technical innovation in solving the 
problems of incremental model development and reuse in 
biology. 
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In this paper, we describe PySB, an open-source program- 
ming framework written in Python that allows concepts and 
methodologies from contemporary software engineering to be 
applied to the construction of transparent, extensible and 
reusable biological models (http://python.org; Oliphant, 
2007). A critical feature of modeling with PySB is that models 
are Python programs, and tools for documentation, testing, 
and version control (e.g., Git) can be used to help manage 
model development. Strictly speaking, a PySB 'model' is a 
Python program, that, when executed, produces another 
program (the underlying reaction rules) that can be analyzed 
or used to create equations for simulation. The PySB frame- 
work provides a high-level action-oriented vocabulary con- 
gruent with our intuitive understanding of biochemistry 
(A phosphorylates B, C translocates to the nucleus, etc.). 
PySB is closely integrated with Python numerical tools for 
simulation and parameter estimation and graphical tools that 
enable plotting of model trajectories and topologies. We 
demonstrate the use of PySB to re-instantiate 12 previously 
published ODE-based models of the Bcl-2 family proteins that 
regulate apoptosis (Chen et al, 2007a; 2007b; Cui et al, 2008; 
Albeck et al, 2008b; Howells et al, 2010). We show how PySB 
can be used to decompose models into reusable macros that 
can be independently tested, and we generate composite 
models that combine a prior model from our lab describing 
extrinsic apoptosis (Albeck et al, 2008b; 2008a) with alter- 
native hypotheses about Bcl-2 family interactions from other 
investigators. Finally, we develop and calibrate an expanded 
cell-death model that spans the diversity of the multi-protein 
Bcl-2 family and draws on findings from leading biochemists 
in the field, depicting a unified, 'embedded together' mechan- 
ism for mitochondrial membrane permeabilization (Leber 
et al, 2010; Llambi et al, 2011). All models in this paper, along 
with the PySB source code and documentation, are available 
for sharing and further development at GitHub and the PySB 
website (http://pysb.org; Materials and methods). 



Results 

We chose Python as the language for PySB because of its 
widespread use in the computational biology community, 
support for object-oriented and functional programming, and 
rich ecosystem of mathematical and scientific libraries. At the 
outset, we determined that PySB should interoperate seam- 
lessly with BioNetGen (Faeder et al, 2009) and Kappa (Danos 
et al, 2007b), and thereby build on substantial investments in 
rule-based modeling. A PySB model consists of instances 
of a core set of classes: Model, Monomer, Parameter, 
Compartment, Rule, Initial and Observable, closely 
mirroring the form of BNGL and Kappa models. However, in 
PySB, the component declarations return software objects 
inside Python, allowing model elements to be manipulated 
pro grammatically. To simplify the process of writing rules and 
to maximize the syntactic match with BNGL, PySB redefines 
(overloads) some of the mathematical operators in Python to 
create a shorthand that resembles chemistry notation (Box 2). 
For example, in the context of a PySB rule definition, the ' + ' 
operator (which in other contexts represents mathematical 
addition) is used to enumerate a list of reactants or products. It 
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Box 2. PySB embeds a biological modeling language within Python. 

Existing computer languages developed for biological modeling (e.g., 
BNGL or SBML) use a specialized syntax to concisely encode the 
detailed specifications of biological models. However, such domain- 
specific languages (DSLs) lack many features found in general-purpose 
programming languages (functions, classes, loops, etc) that can be used 
for organizing complex code and making it more human-readable. They 
also often lack ancillary tools commonly found in general-purpose 
programming languages, such as testing frameworks or documentation 
generation support. Models written using PySB are programs in the 
Python programming language— they are not specialized file formats 
interpreted by modeling programs. When executed, a PySB model 
programmatically 'builds up' the elements of a rule-based model 
(molecule types, rules, parameters, etc) until the specification is 
complete; the Python model object that results can then be subjected 
to further analysis. Relative to a DSL, a traditional, object-oriented 
approach to building up a model in this way would require many extra 
lines of code to create and track objects. PySB streamlines the process 
of programmatically building models by overloading several Python 
operators ( + , », <>,(),%) to allow biological rules to be 
expressed in a chemistry-like syntax based on BNGL. In addition, the 
SelfExporter helper class allows models to be built up declaratively in 
BNGL-like fashion, further minimizing the required code. The overall 
result is a specialized language for biological modeling 'contained within' 
Python and implemented using Python operators (the Python package 
SymPy for symbolic mathematics uses a similar approach). Although this 
'syntactic sugar' streamlines the most common modeling use cases, in 
some advanced modeling scenarios, a more traditional programming 
syntax may be preferred; the streamlined syntax is thus entirely optional. 
The PySB syntax is described in detail in the Supplementary Materials. 



is not necessary to use overloaded PySB operators but it makes 
the models easier to write and understand (see 'PySB syntax' 
in the Materials and methods and the Supplementary 
Materials) . 

By way of illustration, consider a 'Hello World' program in 
PySB involving reversible binding of proteins L and R, each of 
which contains a single binding site s. A PySB program for this 
simple reaction has high overhead relative to the equivalent 
set of ODEs but serves to introduce the basic PySB syntax and 
show how it interoperates with other software such as BNG, 
the VODE integrator (Brown et al, 1989), and the Matplotlib 
plotting library (Hunter, 2007) . In the first block of the program 
(Figure 1A; block 1), a declaration of molecule types 
('monomers' in PySB) is followed by the forward and reverse 
rate parameters, kf and kr, and initial conditions for unbound 
L and R. The syntax for the reversible binding rule (which will 
be familiar to users of rule-based languages) reads as follows: 
when the proteins L and R both have empty binding sites s 
(e.g., L ( s = None ) ), they reversibly bind to form a complex 
that shares a single 'bond' (e.g., l(s = 1) % r(s = 1)), at 
rates kf and kr. This approach to naming binding sites (and 
calling interactions 'bonds,' without implying covalent mod- 
ification) is drawn from rule-based languages and makes it 
possible to describe molecules having multiple sites of 
interaction with different specificities, modifications, and 
occupancy states (e.g., the distinct binding sites on the TRAIL 
receptor for ligand and death-inducing signaling complex 
(DISC) adaptor proteins) . The 'Hello World' model concludes 
by designating an observable of interest, the complex lr. (For 
additional details on the syntax used in this model, see Box 2 
and the 'PySB syntax' section of the Supplementary Materials; 
a unified modeling language (UML) class diagram of the core 
PySB classes is also provided in Supplementary Figure S3.) 
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The second block of code in Figure 1 A defines a time range and 
calls the PySB function odesolve, which performs determi- 
nistic model simulation by invoking BNG (to generate the 
reaction network) and the numerical integrator VODE (see also 
Figure IB). Simulation results are returned as a matrix that is 
graphed using the plot command from Matplotlib (Figures 
1A, block 3 and B). 

Using macros to model recurrent biochemical 
actions 

The benefits of PySB become apparent only with more complex 
and interesting models in which programming constructs such 
as conditionals, loops, functions, classes, and modules are used 
to define reusable elements. The complexity of these elements 
can vary from a few species to multi-component cascades. 
Macros, reusable Python functions that are programmatically 
expanded into rules and reactions, define low-level biochem- 
ical actions such as 'catalyze,' 'bind,' or 'assemble,' mirroring 
the way we describe biological processes verbally. PySB 
currently contains 13 general-purpose macros covering rever- 
sible binding, catalytic modification, synthesis, degradation, 
and pore assembly. Users can generate other model-specific 
macros to implement new or uncommon mechanisms, thereby 
creating libraries of biochemical actions for subsequent 
modeling projects (we are currently adding new macros 
ourselves); phosphorylation/dephosphorylation cascades and 
loops are obvious candidates for such a library. 

As an example, the catalyze macro implements a mass 
action model of an enzyme-mediated biochemical transforma- 
tion (Figure 2 A) based on six user-specified arguments: the 
enzyme and its binding site for substrate, the substrate and its 
binding site for enzyme, the product, and a list of forward, 
reverse, and catalytic rate constants. Figure 2 A illustrates the 
application of catalyze to a reaction in which caspase-8 
cleaves Bid to form truncated Bid (tBid; see Box 1 for a 
description of the relevant biology) . Improved transparency is 
an important benefit of using macros in that they explicitly 
describe how elementary reactions are implemented. For 
example, catalyze invokes a two-step model of catalysis in 
which an enzyme-substrate complex is formed as an inter- 
mediate step: E + S <->ES->E + P. Some published models of 
apoptosis (Chen et al, 2007a; 2007b; Cui et al, 2008; Howells 
et al, 2010) use a pseudo-first-order, one-step approximation 
E + S -> E + P that has the merit of fewer parameters. However, 
depending on time scales and parameter values, there is a 
profound difference in the dynamics of one- and two-step 
catalysis: the one-step model, for example, makes the strict 
assumption that the enzyme always operates in its linear range 
and cannot be saturated. These differences are not apparent 
from the visual or existing verbal descriptions of the ODEs but 
instead require careful retrospective analysis. In contrast, 
PySB allows exploration of mechanistic differences simply by 
calling an alternative catalysis macro, catalyze_one_step 
(Supplementary Figure SIB). The benefit of macros is not only 
that the model is more concise (the catalyze macro call 
replaces two rules and four ODEs) but also that the difference 
between one- and two-step catalytic schemes is clearly declared 
and need not be inferred retrospectively. Because macros are 
tested programmatically, they also ensure correct instantiation 
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from pysb import * 

from pysb. integrate import odesolve 

from pylab import plot, linspace 

Model () 

# Declare the monomers 
Monomer('L' J ['s']) 
Monomer ('R' , ['s']) 

# Declare the parameters 
Parameter( ' kf ' , le-3) 
Parameter ( 1 kr 1 j le-3) 

# Declare the initial conditions 

Initial(L(s=None) J Parameter( ' L_0 ' , 100)) 
Initial(R(s=None) J Parameter( ' R_0 ' s 200)) 

# Declare the binding rule 

Rule( ' L_binds_R 1 } 

L(s=None) + R(s=None) <> L(s=l) % R(s=l), 
kf, kr) 

# Observe the complex 

Observable( ' LR' , L(s=l) % R(s=l)) 



B 



PySB 



External Tools 



© 



PySB 
declarations 



©odesolve( . . . ) 



# Simulate the model 
v ftime = linspace(0j 40) 
^[x = odesolve (model, time) 
(D plot (time, x['LR']) 



® plot(...)- 



Make BNGL 



Make ODEs 



4> 



Build Rxn Net 
(BNG) 



4> 



Integrate 
(VODE) 



1* 



Plot 
(Matplotlib) 



Result 



Rule( J L_binds_R J , 
L(s=None) + 
L(s=l) % R(s 



R(s=None) <> 
1), kf, kr) 



L(s) + R(s) 
kf , kr 



<-> L(s!l).R(s!l) 



begin reactions 

1 1,2 3 kf #L_ 

2 3 1,2 kr #L_ 
end reactions 



binds_R 

binds_R(reverse) 



dL/dt = -kf*L*R 
dR/dt = -kf*L*R 
dLR/dt = kf*L*R 



kr*LR 
kr*LR 
kr*LR 




Figure 1 Creation and simulation of a 'Hello World' model in PySB. (A) Creation and deterministic simulation of a model using PySB. The call to Model ( ) creates 
the pysb . core . Model object to which all subsequently declared components are added. The first block of code declares the molecule types, parameters, initial 
conditions, reversible reaction rule, and observable necessary for modeling and simulating the reversible binding of proteins l and r. The second block of code calls the 
odesolve function from the pysb . integrate module to generate and integrate the ODEs. The third block plots the simulated time course using the Matplotlib 
plot command. Numbers associated with the code blocks identify the correspondences between the code and the control flow shown in B. (B) Control flow for an ODE 
simulation of a PySB model. The columns 'User,' 'PySB', and 'External Tools' indicate the locus of control of each step in the process (boxes). The 'Result' column shows 
the result of each individual step: the gray box indicates results of steps that are internal to the call to odesolve, whereas the other results are visible to the user at the 
top level. After declaring the model elements as in A, the user calls odesolve, which generates the corresponding BNGL for the model, invokes BNG externally on the 
generated BNGL code to create the reaction network, parses the reaction network to generate the corresponding set of ODEs, and calls an external integrator (VODE) to 
generate the trajectories. The trajectories are returned to the user as a NumPy record array, where they are visualized with a call to the plot function from Matplotlib. 



of forward, reverse, and catalytic reactions, an important 
benefit because failure to implement these processes correctly 
is remarkably common (see below and Supplementary Note) . 

The power of macros is most evident with complex 
biochemical actions. For example, the assemble_pore_ 
sequential macro (Figure 2B) implements sequential assem- 
bly of a membrane-associated pore from multiple identical 
subunits. It was written to explore how Bax and Bak associate in 
the mitochondrial outer membrane to form the pore that 
translocates cytochrome c and Smac into the cytosol during 
MOMP (Youle and Strasser, 2008). The arguments to assem- 
ble_pore_sequential are the identity of the pore-forming 
protein and its two binding sites, the maximum number of 
proteins in a pore and a table containing rate constants for each 
assembly step. In vitro experiments suggest that Bax and Bak 
assemble by sequential addition of subunits (Martinez-Caballero 
et al, 2009), but the precise size of an active MOMP pore is 
unknown. Exploring the effects of changing the number n of 
subunits in a pore requires creation and modification of n-1 
BNGL rules or n ODEs but only one value in the PySB 
assemble_pore_sequential macro. 

Many biochemical processes are controlled by families of 
proteins that have overlapping binding specificities. In 
apoptosis, MOMP is regulated by ~20 pro- and anti-apoptotic 
Bcl-2 family proteins that have a range of affinities for each 
other and bind in various arrangements (Box 1). Modeling the 
binding of any two Bcl-2 proteins is simple, but managing 
equations or rules for all possible binding reactions is much 
harder. The bind_table macro uses a simple tabular 
representation to model interactions among members of 



multi-protein families (Figure 2C). The first argument to the 
macro is a table (a list of lists in Python) in which row and 
column headers identify pairs of interacting proteins and each 
table entry contains binding constants or the value None to 
indicate that there is no measurable interaction. The second 
and third arguments specify the binding site names for row and 
column species, respectively. In addition to being concise (a 
single bind_table call for the Bcl-2 proteins generates 28 
rules and 41 ODEs) , the tabular format highlights relationships 
between proteins by grouping them into functional classes; 
new family members can be introduced simply by adding rows 
and columns. A bind_table is therefore a simple compu- 
table representation of the 'binding codes' that have been 
created by others to summarize structural and biochemistry 
studies on Bcl-2 family members (Chen et al, 2005; Kuwana 
et al, 2005; Certo et al, 2006) . By changing the first argument in 
the bind_ table call, it is straightforward to explicitly 
substitute one set of binding data for another, a useful feature 
for exploring differences in published binding codes or for 
modeling the behavior of mutant proteins (Fire et al, 2010; 
Debartolo et al, 2012). Models that use macros such as 
bind_table naturally acquire a 'self-documenting' character 
that minimizes the need for additional explanatory description 
(see Figures 4B and 5 A for examples of bind_table in the 
context of models of MOMP initiation) . 

Modules and model reuse 

Macros abstract biochemical reactions at a fairly low level of 
detail involving a few proteins, but Python also supports a 
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Example macro call 



Implementation 
BNGL rules 



ODEs 



Macro Arguments 
1 "~1 2 3 4 5 6 1 

catalyze(C8, 'bf, Bid(state='U' ), 'bf, Bid(state='T' ), klist) 

1 : Enzyme (Caspase-8) 4: Substrate-binding site name 



2: Enzyme-binding site name 

3: Substrate (full length, cytosolic Bid) 

Supplementary Figure S1A 



5: Product (T for truncated, cytosolic Bid) 
6: List of rate parameters 



C8(bf) + Bid(bf,state~U) <-> C8(bf !l).Bid(bf !l,state~U) kf, 
C8(bf !l).Bid(bf !l,state~U) -> C8(bf) + Bid(bf 3 state~T) kc 



C8: ds0/dt 

Bid: dsl/dt 

C8_Bid: ds2/dt 

tBid: ds3/dt 



kc*s2 - kf*s0*sl + kr*s2 
-kf*s0*sl + kr*s2 
-kc*s2 + kf*s0*sl - kr*s2 

kc*s2 



kr 



B 



Example macro call 



BNGL rules 
ODEs 



Macro 



~ir 



Arguments 



1 



assemble_pore_sequential(Bax, ' si 
1: Subunit name 

2, 3: Sites for binding neighboring subunits 

5 Rules 

6 ODEs 



~5 1 

, 'si', 6, ktable) 

4: Maximum number of subunits in pore 

5: Table of rate parameters 



Example macro call 



Macro 



Arguments 



BNGL rules 
ODEs 





Bcl2, 


BclXL, 


BclW, 


Mcll, 


Bfll], 


[Bid, 


66, 


12, 


10, 


10, 


53], 


[Bim, 


10, 






10, 


73], 


[Bad, 


11, 


> 


60, 


None, 


None] , 


[Bik, 


151, 


10, 


17, 


109, 


None] , 


[Noxa, 


None, 


None, 


None, 


19, 


None] , 


[Hrk, 


None, 


92, 


None, 


None, 


None] , 


[Puma, 


18, 




25, 


10, 


59], 


[Bmf, 


24, 








None] ] , 


'bf' , 


'bf, kf= 


=le-3]] 








2 


3 


4 









1 : Bind table data (molecules and rates) 
2: Binding site for row-group proteins 

28 Rules 

41 ODEs 



3: Binding site for column-group proteins 
4: Default association rate 



Figure 2 Three examples of mechanistic abstractions using macros. Full implementation of all macros can be found in the macros . py file in the PySB source code 
online (Materials and methods). (A) catalyze. The example call shows how the catalyze macro is called to add a catalytic reaction in which active caspase-8 
(C8) binds to untruncated Bid (Bid ( state = X U' )) to yield tBid (Bid ( state = X T' ) ). Rate parameters (forward, reverse, and catalytic) are provided in 
the list v klist ' . The 'Basic implementation' in Supplementary Figure S1A shows the Python source code for a simplified version of the catalyze macro. 
Execution of the catalyze macro leads to the creation and addition of two rules to the model, which, when converted into BNGL, take the form shown below. 
Generation of the reaction network via BNG then results in the system of four ODEs shown at bottom. (B) assemble_pore_sequential models the assembly 
of a pore in a sequential fashion in which monomers bind to form dimers, dimers bind monomers to form trimers, trimers bind monomers to form tetramers, etc. Pores of 
size three (trimers) and above have a closed, ring-shaped topology, reflecting the variable structure and stoichiometry for the Bax pore suggested by recent in vitro 
experiments (Martinez-Caballero et al, 2009). The maximal size of the pore is given by the fourth argument to the macro (i.e., 6). With the parameters shown in the 
example, the execution of the macro results in five rules (for binding of monomers to monomers, monomers to dimers, monomers to trimers, etc) and six species and 
ODEs (monomers through hexamers). Pores with greater stoichiometry can be modeled simply by changing the pore size argument in the macro call. (C) 
bind_table is used to concisely represent combinatorial binding among two related groups of molecules. In the example call, the species in the column headers are 
the five known anti-apoptotic Bcl-2 proteins, whereas the row headers are various pro-apoptotic BH3-only proteins. The table entries represent the dissociation constants 
for binding between each pair of proteins drawn from in vitro measurements, given in units of nanomolar (Willis etal, 2005; Certo etal, 2006); the Python constant None 
indicates that no binding occurs. (In place of a dissociation constant, a table cell may alternatively contain a pair of explicit association and dissociation rates.) The names 
of the binding sites for the row-group and column-group proteins (i.e., v bf ' ) are given as the second and third arguments. The final argument (i.e., kf = le-3) 
specifies a default association rate to be applied to all reactions, given in units of nM " 1 s" 1 . The execution of the bind_table call results in the instantiation of 28 
reversible binding rules, each with the given association rate and a dissociation rate calculated from the dissociation constant provided in the table entries; this further 
expands to 41 ODEs. 
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wide variety of methods for reusing more complex model 
elements. We have found three to be particularly useful: 
(1) Instance Reuse, for making small changes to an existing 
model; (2) Module Reuse, for programmatically composing a 
model from reusable pieces or modules; and (3) Class Reuse, 
for models involving combinatorial variation of several 
independent features. Of these, Instance Reuse is the simplest, 
entailing programmatic duplication of a previous model and 
explicit specification of new and modified elements. Instance 
Reuse proved to be an appropriate way to update the PySB 
version of EARM 1.0 (Albeck et al, 2008a) to include synthesis 
and degradation reactions (Figure 3 A) . This approach replaces 
conventional 'cut-and-paste' copying and editing of ODEs or 
BNGL rules. Reuse is achieved by cloning the old model into a 
new model and then explicitly declaring the elements that are 
added or modified, making the changes easy to understand 
and track. 

Module Reuse involves the separation of a model into 
independent elements ('motifs' or 'modules'; see also 
Figure 7) that are written as callable subroutines in Python. 
It is not necessary for a biological process to be modular in a 
functional sense for modularization in PySB to be advanta- 
geous. Building a model from subroutines enables a 'mix-and- 
match' approach in which a subset of the interactions in a 
model is subjected to revision or re-examination, whereas the 
rest remain the same. For example, we divided EARM 1.0 into 
three modules each involving self-contained blocks of PySB 
code for: (1) reactions from ligand-death receptor association 
to binding of DISC components; (2) interactions among Bcl-2 
family members controlling MOMP; and (3) the cascade of 
reactions involving initiator and effector caspases and their 
immediate regulators (Figure 3B; Box 1). A series of papers 
examining alternative models of MOMP have been published 
by multiple groups (Chen et al, 2007a, b; Cui et al, 2008; 
Howells et al, 2010), but MOMP has most commonly been 
studied in isolation from reactions occurring upstream and 
downstream. However, it has been shown that multi-protein 
cascades do not exhibit the same behavior in isolation as when 
they are part of larger networks (Del Vecchio et al, 2008; Chen 
et al, 2009). One of the primary aims of modeling signal 
transduction is to contextualize molecular mechanisms by 
embedding them in a network context. Thus, studies of 
extrinsic apoptosis would benefit from models in which 
alternative hypotheses for MOMP regulation are embedded 
in a more complete reaction pathway. Using conventional 
modeling tools, it is challenging to add MOMP 'mini-models' 
to upstream and downstream reactions (Albeck et al, 2008b) . 
In contrast, in PySB, this type of composition is simple: we 
have written a PySB program in which any of 15 models of 
MOMP are called up, along with common rec_to_bid and 
pore_to_parp modules to create 15 fully functioning hybrid 
models of extrinsic apoptosis (Figure 3B). The upstream and 
downstream reactions have also been modeled in several 
different ways by others (Rehm et al, 2006; O'Connor et al, 
2008; Fricker et al, 2010; Neumann et al, 2010; Schleich et al, 
2012), and it would be straightforward to use Module Reuse to 
combine different proposals for rec_to_bid and pore_ 
to_parp with different MOMP modules. 

Class Reuse is a third and more sophisticated approach that 
exploits the class inheritance mechanism in Python. Common 
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model features are coded in a base class, and model variants 
are written as 'child' classes of the base class, able to inherit 
code from the base class directly with or without program- 
matic modification. Code from multiple variants can then be 
combined by further inheritance from more than one of these 
classes. For example, we used Class Reuse to model the effects 
of reaction compartmentalization on interactions among pro- 
apoptotic Bcl-2 proteins at the mitochondrial membrane 
(Figure 3C). In one case, we assumed that reactions took 
place in two well-mixed reaction compartments corresponding 
to cytosol and membrane (Figure 3C, TwoCpt), and in the 
second, we assumed that each mitochondrion constituted a 
distinct reaction compartment (MultiCpt). In addition, we 
independently explored different reaction topologies involving 
the Bcl-2 proteins tBid and Bax. Both topologies (Topol and 
Topo2) include the translocation of tBid and Bax to 
membranes but only the second topology (Topo2) incorpo- 
rates activation of Bax by tBid. We used inheritance to 
automate creation of four different models having different 
compartmentalization schemes and reaction topologies (e.g., 
TwoCpt_Topol). The notable feature of Class Reuse is that 
model variants are created and combined over multiple 
independent 'axes' — in this example, compartmentalization 
and protein-interaction topology — transparently and with no 
duplicated code. 



Integration with the Python ecosystem 
and external modeling tools 

The iterative process of model development is dramatically 
accelerated when tools for model creation, simulation, 
analysis, and visualization are integrated. Many commercial 
and academic software packages, including Mathematica 
(Wolfram Research, Inc., 2010) and MATLAB (Mathworks, 
2012), provide integrated tools for equation-based models but 
are unwieldy to use with rule-based or programmatic 
approaches because models must be exported and imported 
using SBML (Hoops et al, 2006; Maiwald and Timmer, 2008). 
At the same time, rule-based model editors such as RuleBender 
for BNGL (Smith et al, 2012) and RuleStudio for Kappa 
(https://github.com/kappamodeler/rulestudio) facilitate 
development of rule-based models but do not incorporate 
tools for data analysis, parameter fitting, and symbolic math. 
Simply by virtue of being written in Python, PySB interacts 
natively with a large and growing library of open-source 
scientific software such as NumPy, SciPy, SymPy, and 
Matplotlib (Table I). Models written using PySB can also 
exploit Python tools for documentation generation (sphinx) 
and for unit testing (unittest, nose, and doctest), both 
of which we used extensively in creating the models of 
extrinsic apoptosis described below. 

To interface PySB with BNG and Kappa, which are not 
implemented in Python, we wrote Python 'wrapper' libraries, 
providing access to agent-based simulation, static analysis, 
and visualization. The wrappers also manage the syntactic 
differences between BNGL and Kappa, allowing either to be 
used for the same PySB model. Models can be trivially 
exported in BNGL format for use with established 'all-in-one' 
tools that support BNGL, such as V-Cell (Moraru et al, 2002). 

Molecular Systems Biology 2013 7 



PySB: programming biological models 
CF Lopez et al 



CD 
C/) 

CD 
DC 



B 



earm_1_0 




earm_1_3 









Compartmentalization 



Interaction topology 



eanm_l_0. py 



Model () 

<Components, macros, etc> 



earm_l_3.py 



import earm_l_0 

Model ( base=earm_l_0 . model) 

# Add/modify components 
synthesize_degnade_table( [ . . . ] ) 




compartment jzlasses . py 



rec_to_bid 



pore_to_parp 



model_variant 



earmjnodules. py 



def rec_to_bid() : 

# Upstream module 

<Components, macros, etc> 

def pore_to_parp() : 

# Downstream module 

<Components, macros, etc> 



model_variant . py 



import earmjnodules 
Model () 

earmjnodules. rec_to_bid( ) 

# Implement variant MOMP model 

<Components, macros, etc> 

earmjnodules . pore_to_parp( ) 



3- 

(D 

DC 



class CptBase(object) : 

def tBid_activates_Bax(self ) : 

# Default implementation 

<Components, macros, etc> 

class TwoCpt(CptBase) : 

def translocate_tBid_Bax(self ) : 

# Specific implementation 

<Components, macros, etc> 

class MultiCpt(CptBase): 

def translocate_tBid_Bax(self ) : 

# Specific implementation 

<Components, macros, etc> 



topology jzlasses . py 



class Topol(object) : 

def build_model(self ) : 

self .translocate tBid 



Bax() 



class Topo2(object) : 

def build_model(self ) : 

self .translocate_tBid_Bax( ) 
self .tBid_activates_Bax( ) 



twocpt_topol . py 



from compartment_classes import TwoCpt 
from topology_classes import Topol 

TwoCpt_Topol = type ( ' TwoCpt_Topol 1 s 

(TwoCpt, Topol), {}) 
model_builder = TwoCpt_Topol( ) 
model_builder . build_model( ) 



Figure 3 Three approaches to model reuse. Boxes with gray tabs represent Python modules/files. Statements marked 'Reuse' identify the point of reuse of previously 
created model code. Background highlighting indicates correspondence between elements in the reuse schematic and the PySB code. (A) Direct reuse and subsequent 
modification of pre-existing model code. A pre-existing model is declared in its own Python file (earm_l_0 . py). The extending model in the file earm_l_3 . py 
(representing a later version) imports and duplicates the model object from earm_l_o . py and subsequently adds a list of synthesis and degradation reactions. (B) 
Reuse of modules using macros. Macros instantiating the components for the upstream (rec_to_bid) and downstream (pore_to_parp) portions of the 
extrinsic apoptosis pathway are placed in a Python file, earmjnodules .py. Variant models differing only in the reaction topology for MOMP initiation (e.g., 
mode l_vari ant .py) are then created by invoking these macros for the shared upstream and downstream elements. (C) Reuse and recombination of model 
elements through class inheritance. A shared implementation of the compartmentalization-independent reactions for Bax activation (i.e., tBid_activates_Bax) 
is contained within the base class CptBase. Alternative compartmentalization strategies are implemented in the child classes TwoCpt and Multicpt, which 
separately implement the compartmentalization-dependent reactions for tBid and Bax translocation (i.e., translocate_tBid_Bax). Because these classes 
inherit from CptBase, they acquire the implementation of tBid_activates_Bax, representing a point of reuse. Alternative protein-interaction topologies are 
implemented within the build_model function in the two classes Topol and Topo2. Models with either of the compartmentalization implementations (e.g., 
TwoCpt) and either of the interaction topologies (e.g., Topol) can then be created dynamically by inheriting from the appropriate classes, representing an additional 
point of reuse. Readers familiar with the concept of polymorphism from object-oriented programming will note that calling the build_model method on any of the 
hybrid classes will polymorphically refer to the correct implementations in the parent classes. 
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Albeck Group Shen/Howells group 



B 



Albeck 11b 
Albeck 11c 

1 

Albeck 11d 

I 

Albeck 11e 

1 

Albeck 11f 



Lopez Group 



Chen Biophys. J. Chen FEBS "Indirect" 
Chen FEBS "Direct" 

i 

Cui "Direct" 

I 

Cui "Direct 1" 

I 

Howells Cui "Direct 2" 



Lopez "Indirect" Lopez "Direct" Lopez "Embedded" 



dAct 

dt ' 
dEna 
dt ' 
dlnBax 
dt " 
dBcl2 
dt ' 

dBax 
dt ' 
dActBcl2 
dt 

dEnaBcl2 
dt ' 
dMAC 
dt " 



= -kBH3-Bd2 ■ Act ■ Bcl2 + kr B H3Bd2 • ActBcl2 
-- -k B H3.Bd2 ■ Bcl2 ■ Ena + kr B H3Bd2 ■ EnaBcl2 

2 —klnBax ■ Act ■ InBaX + k Bax " BaX 

z -k B H3.Bci2 ■ Act ■ Bcl2 + kr B H3Bd2 ■ ActBcl2 

- k BH 3.Bcl2 ■ Bcl2 ■ Ena + kr B H3Bd2 • EnaBcl2 
- k InBax ■ Act ■ InBax - k a ■ Bax 4 - k Bax ■ Bax + 4 • kr Q ■ MAC 
~- k B H3-Bci2 • Act • Bcl2 - kr BH3Bc i2 ■ ActBcl2 
z k BH 3-Bci2 ■ Bcl2 ■ Ena - kr B H3Bd2 • EnaBcl2 
-- k a ■ Bax 4 - kr Q ■ MAC 



def chen_f ebs_direct(do_pore_assembly=True) : 

# One-step "hit-and-run" activation of Bax by tBid 
catalyze_one_step_reversible( 

Bid(state='T', bf=None), 
Bax(bf =None, **inactive_monomen) , 
Bax(bf=None, **active_monomer) , 
[le-3, le-3]) 

# Bcl2 binds tBid and Bad (a sensitizer) but not Bax 
bind_table([[ Bcl2], 

[Bid(state='T'), (le-4, le-3)], 
[Bad(state='M') J (le-4, le-3)]]) 

if do_pone_assembly : 

assemble_pore_spontaneous(Bax(state= ' A' , bf =None) , 
[4e-3, le-3]) 



def cui_dinect( ) : 

# Build on the direct model from Chen et al. (2007) 

# FEBS Lett, (excluding pore assembly) by: 
chen_febs_direct(do_pore_assembly=False) 



# 1. Overriding some parameter values, 
one_step_BidT_BaxC_to_BidT_BaxA_kf .value 
bind BidT Bcl2 kf. value = 0.001 



0.0005 



# 2. Adding a Bad-for-Bid displacement reaction, 
displace_reversibly(Bad(state='M' ), Bid(state='T' ), Bcl2, 

[0.0001, 0.001]) 

# 3. Adding simplified MAC formation (Bax dimerization) 
assemble_pore_sequential(Bax(state= ' A' , bf=None), 2, 

[[0.0002, 0.02]]) 

# 4. Adding synthesis and degradation reactions 
Bax2 = Bax(sl=l, s2=None) % Bax(sl=None, s2=l) 
synthesize_degrade_table( 

[ [Bax(bf =None, **inactive_monomer) , 
[Bax(bf =None, **active_monomer) , 
[Bid(state='T', bf=None), 
[Bcl2(bf=None), 

[Bid(state='T', bf=l) % Bcl2(bf=l), 
[Bad(state= 'NT , bf=None, serine='U'), 
[Bad(bf=l) % Bcl2(bf=l), 
[Bax2, 



0.06, 


0.001], 


None, 


0.001], 


0.001, 


0.001], 


0.03, 


0.001], 


None, 


0.005], 


0.001, 


0.001], 


None, 


0.005], 


None, 


0.0005]]) 



Figure 4 Refactoring of published models into PySB. (A) Relationships between the models examined in this paper. The 'Albeck Group' incorporates a series of 
incrementally expanded models shown in Figure 1 1 of Albeck et al (2008b); the 'Shen/Howells Group' incorporates models from three papers from the research group of 
Shen and colleagues (Chen et al, 2007a; 2007b; Cui etal, 2008) and a derivative model from Howells et a/ (2010); the 'Lopez group' includes three expanded models 
introduced in this paper. The arrows indicate that one model has been derived or extended from a prior model and point in the direction Base Model -> Derivative Model. 
(B) The 'Direct' model from Chen et al (2007b) in its original ODE-based representation. (C) Conversion of the Chen 'Direct' model to a PySB module. The execution of 
the chen_f ebs_direct function results in rules that exactly reproduce the ODEs shown above (the molecule type Bad in the PySB function corresponds to the 
generic enabler species Ena in the original equations; Bid corresponds to the generic activator Act). The macro catalyze_one_step_reversible 
implements the two-reaction scheme E+S^>E+ P, P^S; assemble_pore_spontaneous implements the order-4 reaction 4 x subunit pore. The 
bind_table macro is illustrated in Figure 2C. (D) Model extension in PySB. Module Reuse (Figure 3B) was used to implement the 'Direct' model from Cui et al 
(2008) as an extension of the prior 'Direct' model from Chen et al (2007b) shown in Figure 4C. Invocation of the PySB function chen_f ebs_direct incorporates 
the elements of the original Chen et al (2007b) model; subsequent statements specify the modifications and additions required to yield the derived model from 
Cui efa/(2008). 



Table I Integration with external modeling tools 



Tool 


Reference 


Interface 


Description (relevance to PySB) 


NumPy 


Oliphant, 2007 


Python 


Efficient array and matrix operations 


SciPy 


Oliphant, 2007 


Python 


Scientific algorithms, e.g., ODE integration, statistics, and optimization 


SymPy 


SymPy Development Team, 2012 


Python 


Symbolic manipulation of mathematical expressions 


Matplotlib 


Hunter, 2007 


Python 


Plotting and other data visualizations 


Graphviz 


Gansner and North, 2000 


Python 


Layout and rendering of node-edge graphs 


BNG 


Faeder et al, 2009 


Wrapper 


Translation of rules to a reaction network; stochastic simulation 


Kappa 


Danos et al, 2007b 


Wrapper 


Stochastic simulation; visualization and analysis of rules models 


SBML 


Hucka, 2003 


Export 


Compatibility with SBML tools 


Mathematica 


Wolfram Research, Inc., 2010 


Export 


General-purpose scientific computing 


MATLAB 


Mathworks, Inc., 2012 


Export 


General-purpose scientific computing 
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PySB can export models as systems of ODEs in formats for 
SBML (Hucka, 2003), MATLAB, Mathematica, or Potters- 
Wheel (Maiwald and Timmer, 2008). Finally, to facilitate 
unique identification of model components, we added a light- 
weight annotation capability that allows any model element 
(including macros and modules) to be tagged with identifiers 
from external databases using subject-object-predicate triples 
compatible with MIRIAM (Le Novere et al, 2005). The net 
result is a software environment that combines the flexibility 
of a general-purpose scientific computing package with 
programmatic and rule-based modeling tools and an open- 
source code base. 



EARM version 2.0: a family of models of extrinsic 
apoptosis and MOMP 

To explore the ability of PySB to model the latest molecular 
data on apoptosis while also building on previous work, we 
used macros and Module Reuse to construct a family of cell- 
death models involving 15 different modules for MOMP 
regulation. Seven of the MOMP modules were previously 
published by the research group of Shen and colleagues (Chen 
et al, 2007a, b; Cui et al, 2008), one module extends a Shen 
group model (Howells et al, 2010), five modules are from the 
work of Albeck et al (2008b), and three modules are entirely 
new and incorporate more complete sets of interactions among 
Bcl-2 proteins (Figure 4A) . The three new modules are derived 
from word models in recent studies from Green and Andrews 
that unify previously competing mechanisms of pore forma- 
tion (Billen et al, 2008; Leber et al, 2010; Llambi et al, 2011). 
Each of the 15 modules was instantiated in PySB as a distinct 
subroutine that can be called and analyzed in the context of a 
receptor-to-caspase pathway. The set of 15 MOMP modules is 
by no means complete, and several noteworthy models of 
extrinsic apoptosis and MOMP (Bentele et al, 2004; Bagci et al, 
2006; Legewie et al, 2006; Rehm et al, 2009; Dussmann et al, 
2010) have not yet been coded in PySB. However, because our 
objective is to explore model reuse and composition using 

Table II Summary of models in EARM 2.0 

Model name a ID (full/MOMP) Reference 



Lopez Embedded 


Mla/b 


This paper 


Lopez Direct 


M2a/b 


This paper 


Lopez Indirect 


M3a/b 


This paper 


Albeck lib 


M4a/b 


Albeck et al, 2008b 


Albeck 11c 


M5a/b 


Albeck et al, 2008b 


Albeck lid 


M6a/b 


Albeck et al, 2008b 


Albeck lie 


M7a/b 


Albeck et al, 2008b 


Albeck 11 f 


M8a/b 


Albeck et al, 2008b 


Chen 2007 Biophys J 


M9a/b 


Chen et al, 2007a 


Chen 2007 FEBS Direct 


MIOa/b 


Chen et al, 2007b 


Chen 2007 FEBS Indirect 


Mlla/b 


Chen et al, 2007b 


Cui Direct 


M12a/b 


Cui et al, 2008 


Cui Direct 1 


M13a/b 


Cui et al, 2008 


Cui Direct 2 


M14a/b 


Cui et al, 2008 


Howells 


M15a/b 


Howells et al, 2010 



PySB, we limited ourselves to the 15 MOMP-focused examples 
described above. We collectively denote the resulting set of 30 
variant models (15 models only of MOMP plus 15 models of 
extrinsic apoptosis incorporating the MOMP modules) as 
Extrinsic Apoptosis Reaction Model version 2.0 (EARM 2.0); 
the models are summarized in Table II and are available as a 
Python package with source code downloadable from GitHub 
(Materials and methods). 

While porting existing models into PySB, we observed that 
several published ODE networks contained one or more errors 
relative to their verbal or graphical descriptions in the original 
paper (Cui et al, 2008) (see Supplementary Note). We were 
unable to discern whether the errors in the published ODE 
networks represent genuine mathematical errors or merely 
transcription errors made in the process of converting 
computer models to text. Even when we used the ODE 
networks as published, we found cases in which we were 
unable to reproduce the results described in the figures. Our 
own previous work was not entirely free of this problem: we 
could not reproduce the simulation results in Figure 11 of 
Albeck et al (2008b) without access to MATLAB source code 
that was inadvertently omitted from the original publication. 
Our aim is not to criticize these papers but instead to 
emphasize that the current practice of maintaining different 
forms of a model for the purpose of simulation, illustration, 
and publication is highly problematic. The lists of equations 
included as supplementary materials in most modeling papers 
are particularly troublesome because they exist independently 
of the simulation model and the two tend to deviate. These 
problems can be addressed by using electronic formats for 
model exchange with a single master from which all other 
versions are derived (Hucka et al, 2003; Waltemath etal, 2011). 
As an electronic format for models, PySB complements XML- 
based formats such as SBML in that macros, modules, and 
other high-level abstractions make model structure more 
intelligible than SBML alone. In addition, modeling biochem- 
ical processes by reusing previously validated macros elim- 
inates 'bookkeeping' errors such as those we identified in 
published MOMP models. To ensure that the reinstantiated 



MOMP-only (Mnb) b Full apoptosis (Mna) c 
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c Full apoptosis variants are identified as Mna, e.g., Mia for the full-apoptosis version of 'Lopez Embedded'. 
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models reproduced the behavior of the originally published 
versions, we wrote a series of unit tests using the Python 
modules unittest and nose. The tests guarantee that the 
reinstantiated models reproduce validated states, despite 
translation into PySB. Further details on our approach to unit 
testing can be found in the online documentation (Materials 
and methods). 

The structure and origin of the MOMP models are easier to 
understand using PySB than the underlying sets of ODEs. This 
can most easily be seen by comparing the ODE and PySB 
versions of a model from Chen et al (2007b) (Figures 4B and 
4C) . The original model is relatively simple (only seven ODEs), 
but understanding the precise mechanism for MOMP requires 
careful inspection of each equation. By comparison, the PySB 
model exploits macros to make the mechanisms transparent: 
single-step catalysis, combinatorial binding, and pore assem- 
bly. Many of the 15 MOMP models in EARM 2.0 represent 
incremental extensions of earlier models (this is particularly 
true of the models from Howells et al (2010) and Chen et al 
(2007a, b), as well as the five models from Albeck et al (2008a, 
b); Figure 4A). The authors of these models proceeded by 
duplicating ODEs from previous models or papers and then 
adding new species or reactions as required: e.g., the three 
models of Cui et al (2008) are derived directly from the 'direct' 
model of Chen et al (2007b), whereas the model of Howells 
et al (2010) is based on an earlier model from Chen et al 
(2007a). However, the process of renaming species and 
variables in the derived models makes it difficult to verify 
that each variant correctly recapitulates the structure of the 
original model as claimed. For example, in Cui et al (2008), the 
authors stated simply that the 'direct model' was 'mainly 
based' on the earlier work of Chen et al (2007b), but we found 
that there were several important additions and modifications 
in the derived model, including addition of displacement, 
synthesis and degradation reactions, and a change in the 
MOMP pore from a Bax tetramer to a Bax dimer. Inspection of 
the PySB source code for the direct model of Cui et al (2008) 
(Figure 4D) makes these differences explicit by calling a 
subroutine for the earlier chen_f ebs_direct model 
(Figure 4C) and adding only the new reactions. 

PySB Module Reuse facilitated the process of embedding 
each of the 15 models of MOMP within the context of receptor- 
proximate reactions (ligand binding to Bid cleavage) and 
downstream reactions creating 'Full Apoptosis' and 'MOMP- 
only' versions (summarized in Table II; see also Materials and 
methods). We are currently developing additional apoptosis 
modules (e.g., alternative topologies for receptor activation 
and DISC formation) that will soon be part of the EARM 
repository; other researchers can also 'fork' the code on 
GitHub and contribute their own additions. This should allow 
a cumulative and distributed approach to model development 
and comparison. 



Embedded together: an updated and expanded 
MOMP model 

The EARM 2.0 extrinsic apoptosis model incorporating the 
'Lopez Embedded' MOMP module variant, denoted EARM 2.0- 
Mla for short (Table II), implements a mathematical 
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interpretation of recent experimental findings from Andrews 
(Billen et al, 2008; Leber et al 2010) and Green (Llambi et al 
2011) and differs significantly from previously published 
models of MOMP (Figure 5) . Interactions among Bcl-2 family 
members occur at the mitochondrial membrane rather than in 
the cytosol (Lovell et al 2008), and anti-apoptotic proteins are 
able to bind both the pore-forming proteins, such as Bax and 
Bak, and a larger family of BH3-only Bcl-2 family members, 
thus serving as dominant-negative effectors (Billen et al, 2008; 
Leber et al, 2010) . This is also consistent with a recent 'Unified 
Model' by Green and coworkers demonstrating both 'direct' 
and 'indirect' modes of action by the anti-apoptotic Bcl-2 
proteins (Llambi et al, 2011) (Box 1). The overlapping binding 
specificities implied by this model are summarized in a 
bind_table call that includes the key effector for extrinsic 
apoptosis (Bid), two BH3-only sensitizers (Bad and Noxa), two 
pore-forming effectors (Bax and Bak), and three anti-apoptotic 
proteins (Bcl-2, Bcl-XL, and Mcl-1), along with affinity data 
obtained from in vitro experiments (Willis et al, 2005; Certo 
et al, 2006) (Figure 5A) . There is some doubt about whether 
peptide-based affinity measurements are directly relevant to 
protein-protein interactions occurring on the membranes of 
living cells, and the bind_table macro makes it straightfor- 
ward to experiment with different values (Figure 2C). EARM 
2.0-Mla also assumes auto-activation of Bax (and Bak), which 
has been demonstrated in multiple experimental contexts 
(Tan et al, 2006; Gavathiotis et al, 2010). 

Our previously published EARM 1.0-1.4 models (Albeck 
et al, 2008a, b; Spencer et al, 2009; Aldridge et al, 2011; Gaudet 
et al, 2012) assumed that the all-or-none quality of MOMP 
arose from the ability of Bcl-2 to bind Bax monomers, dimers, 
and tetramers (Albeck et al, 2008b). However, subsequent 
immunoprecipitation experiments failed to support the exis- 
tence of such higher-order hetero-oligomers (Kim et al, 2009). 
To determine whether the updated reaction topology in EARM 
2.0-Mla can reproduce MOMP dynamics measured in single 
TRAIL-treated HeLa cells using Forster resonance energy 
transfer reporter proteins (Spencer et al, 2009), we fitted it to 
data using the simulated annealing algorithm in SciPy 
(Materials and methods). We found that EARM 2.0-Mla had 
as good a fit to data as previous models (Figure 6), and we 
therefore judge it to be superior to our earlier EARM 1.0 model 
based simply on better correspondence with prior knowledge. 
The fitting exercise also demonstrated that Python numerical 
tools can efficiently simulate and calibrate PySB models 
(parameter estimation functions are included in the EARM 
2.0 Python package). 



Discussion 

In this paper, we describe the development and use of PySB, a 
framework for creating, managing, and analyzing biological 
models in which models are full-fledged Python programs. 
PySB modules and macros generate BNGL or Kappa rules, 
which can be converted into mathematical equations. This 
hierarchical process is analogous to the way in which 
programs are written in a high-level language such as C + + 
and converted into microprocessor code by the compiler. This 
complexity is hidden from PySB users who work with macro 
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Figure 5 Representations of the Bcl-2 interaction topology in the EARM 2.0-M1b ('Lopez Embedded') model. The Bcl-2 interaction model consists of five basic 
mechanistic elements or 'motifs'— tBid/Bax/Bcl-xL translocation, activation of Bax and Bak by tBid, Bcl-2 family binding, Bax/Bak auto-activation, and pore assembly— 
that are highlighted in A, B, and C according to the colors in the legend. (A) PySB source code for the model, edited for brevity. (B) Simplified, manually drawn 
representation. (C) The full reaction network, generated from the PySB model using the PySB render_reactions tool. Rectangles represent species, circles 
represent reactions, lines represent reactions with the solid arrowhead representing the nominal forward direction, and the empty arrowhead (for reversible reactions 
only) representing the reverse direction. Catalytic reactions are depicted with a boxed arrow pointing from the catalyst to the reaction circle (species for enzyme- 
substrate complexes are omitted for clarity). (D) Kappa contact map, which shows the superset of all possible bonds between monomers calculated by static analysis 
(Danos etal, 2008). The contact map was computed using Kappa's complx tool accessed through the PySB Kappa wrapper library. Rectangles represent monomers, 
circles represent sites, and lines represent bonds. 



libraries encoding biochemical actions such as 'catalyze,' 
'bind,' 'assemble,' etc. The advantages of a high-level abstrac- 
tion layer include greater transparency and intelligibility, a 
reduction in implementation errors and a dramatic increase in 
the ability to compare and reuse previous work. Each level of 
representation remains accessible for analysis and there are no 
explicitly black-box steps: the Python model code reveals 
model ontogeny, structure, and approach to mechanism; the 
BNGL or Kappa rules generated by PySB support agent-based 
simulation and static analysis; and equations enable numer- 
ical integration, sensitivity analysis, steady-state analysis, etc. 
Use of familiar but powerful programming concepts in PySB 
models such as abstraction, composition, modularity, inheri- 
tance, and polymorphism make it possible to create variant 
models from pre-existing models across several axes of 
variation and build new models from previously tested 
elements. We expect these features of PySB to facilitate 
collaborative model development and evaluation. 

PySB draws on well-established practices in the open-source 
programming community for model documentation and 
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sharing. Because PySB models are programs, they can be 
tracked and shared using the powerful tools developed for 
distributed, open-source software development (e.g., all the 
models in this paper are available, with documentation, at 
GitHub; see Materials and methods). It is simple to update 
models online, highlight differences with previous work and 
divide development among multiple individuals and research 
teams. Finally, PySB can be used as a general-purpose 
modeling tool because it interoperates with diverse scientific 
applications written in Python (e.g., NumPy, SciPy, SymPy, 
and Matplotlib) . Unlike conventional all-in-one programs, 
PySB itself tackles only certain steps in the modeling process, 
relying on interoperability with programs developed and 
maintained by others to create a full-fledged solution. A 
benefit of this approach is that improvements in any of these 
programs accrue directly to users of PySB. 

The power of PySB derives, in part, from its ability to encode 
recurrent biochemical patterns in reusable macros (Figure 2) 
and to divide complex networks into modules that are defined 
once and called when needed (Figures 3B and 4C). By 
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experimental data 

— simulation with nominal parameter values 

— simulation with fitted parameter values 

Figure 6 Simulated annealing fits for EARM 2.0-M1 a for three experimentally 
measured protein signals IC-RP/tBid (A), IMS-RP/Smac release (B), and EC-RP/ 
PARP cleavage (C). Gray lines indicate the experimental data with error bars 
indicating the s.d. In the case of B, the gray line denotes the mean time of death 
7S xp , used to align the trajectories (Materials and methods). The purple curves 
show the simulated trajectories using nominal parameter values; the orange 
curves show the simulated trajectories after model fitting. The objective function 
for fitting is described in the Materials and methods. 



eliminating re-implementation, macros and modules separate 
fundamental mechanistic concepts from implementation 
details, and thereby make clear the purpose and origins of 
specific model features (Mallavarapu et al 2008; Pedersen and 
Plotkin, 2008; Mirschel et al 2009; Gnad et al 2012). The 
ability of real biological networks to be meaningfully decom- 
posed into functional modules is highly context dependent and 
a matter of controversy (Del Vecchio et al 2008), but there is 
no requirement that modules in PySB correspond to modules 
in a biological or 'black-box' engineering sense: the full 
reaction network is always accessible without simplification. 
Instead, PySB modules are defined according to flexible and 
convenient organizational boundaries, keeping open the 
possibility for cross-talk and emergent interactions with other 
modules. This style of modularity follows the open-ended 
approach of little b (Mallavarapu et al 2008) and differs from 
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ProMoT, in which modules interact only through previously 
designated molecular species (Mirschel et al 2009). In general, 
choosing the right boundaries for a module, whether a 
software program or a biological model, is a matter of art 
and practical experience. In the models of extrinsic apoptosis 
analyzed in this paper, reactions governing MOMP are a good 
candidate for modularization because they largely take place 
in a discrete compartment (the mitochondrial membrane) and 
focus on reactions among Bcl-2 proteins. 

We have found that PySB naturally supports a hierarchy of 
modeling concepts (Figure 7 A) . At the top of this hierarchy are 
the models themselves, which represent a specific hypothesis 
about the topology and activity of a biological system or 
network; at the bottom of the hierarchy are specific mathema- 
tical equations (e.g., ODEs). Typical approaches to modeling 
proceed by directly rendering the hypothesis in equations, 
making it difficult to discern the assumptions implicit in the 
process of mathematical translation (Figure 7B). Rule-based 
approaches represent an intermediate level of abstraction in that 
they enumerate local interactions between proteins in a way that 
is less explicit than equations (Figure 7C). PySB adds an 
additional layer of abstraction in that the user works with macros 
and functions (Figure 7A; see also Figure 2) . Sets of macros are 
then grouped into reusable subroutines that implement small 
mechanistic 'motifs' corresponding roughly to a sentence in a 
word model, such as £ tBid activates Bax and Bak.' Such 'motifs' 
are then composed into modules, and modules into models. 
Constructed in this fashion, a set of variant models forms a £ web' 
of intertwined elements that is largely self -documenting. 



PySB as a second-generation approach 

PySB is not the first attempt to create a high-level language for 
modeling biochemistry and was inspired by ProMoTand little 
b, both of which represent models as LISP programs 
(Mallavarapu et al 2008; Mirschel et al 2009). However, as 
described above, these tools had limited or no support for rule- 
based modeling. PySB is based on the much more familiar 
Python language and is interoperable with BNGL and Kappa. 

The rule-based modeling community is also developing 
tools for managing complex models. For example, MetaKappa 
targets redundancy in models having related molecular 
species and partially overlapping functional characteristics 
(e.g., a set of mutants or isoforms of a single protein) (Danos 
et al 2009). The Language for Biological Systems (LBS) is 
another approach in which rules are combined with methods 
for constructing parameterized modules (Pedersen and 
Plotkin, 2008). MetaKappa and LBS are examples of 
domain-specific languages (DSL) for high-level biological 
modeling, whereas PySB supports high-level modeling 
through the structured programming features of Python 
(Box 2). Through Python, PySB provides substantial flex- 
ibility in organizing models. A potential drawback of this 
flexibility is that static analysis of models may be more 
challenging to implement than for DSLs such as MetaKappa or 
LBS. For the time being, available static analysis algorithms 
can be applied to the rules generated from PySB models 
(Danos et al 2008; Feret et al 2009) . We expect the strengths 
and weaknesses of these different approaches to become 
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Models 

Mechanistic hypotheses 

Modules 

Self-contained pathways in a model; 
paragraphs in a word model 

Motifs 

Small mechanistic elements; 
sentences in a word model 
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Common biochemical processes; 
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Figure 7 A conceptual hierarchy for model building with PySB. (A) Simplified hierarchy for EARM 2.0-M2a and M3a. Models can be decomposed into a small number 
of separable modules; modules can in turn be decomposed into smaller recurrent mechanistic elements or motifs. Motifs are collections of macro calls (e.g., the motif 
'tBid activates effectors' consists of two catalyze calls; Figure 5A). Each macro expands into multiple rules (Figure 2) and rules generate equations (Figure 1 B). The 
gray box highlights the intermediate abstraction layers afforded by PySB. (B) ODE-based approaches to model construction proceed directly from a complex biological 
system to its mathematical representation, whereas (C) rule-based approaches provide one abstraction layer above ODEs. 



clearer as high-level languages are more widely adopted by 
the modeling community. 

Graphical tools represent an alternative approach to making 
biological models easier to understand. CellDesigner (Kitano 
et al, 2005) and BioUML (Kutumova et al, 2012) employ visual 
interfaces and graphical languages for model creation, inter- 
pretation, and revision. Our experience has been that purely 
visual approaches to model creation do not scale well with 
model size and are ambiguous with respect to the underlying 
biochemical mechanisms. This parallels experience in the 
software engineering community with UML diagrams: dia- 
grams are a helpful adjunct to programmatic abstractions but 
they are not a replacement. This implies that visualizations 
should be created from an underlying computable representa- 
tion and not the other way around; this is the approach taken 
by the rxncon software, which uses specialized tables to 
describe reactions in a way that specifically supports multiple 
visualizations (Tiger et al, 2012). However, to be most useful, 
the computable representation must be made as transparent as 
possible: approaches that rely on generated visualizations for 
transparency are hard to modify, because changes must be 
made to the underlying (non-transparent) representation. 
Although visualization remains important for communicating 
and describing models (Chylek et al, 2011), PySB follows the 
common software engineering paradigm in which program- 
matic abstraction serves as the principal tool for managing 
complexity and visualization serves to illustrate specific 
properties of a system (Danos et al, 2008). By way of 
illustration, we show how a reaction network (Figure 5C) 
and a Kappa contact map (Figure 5D) can be generated from 
EARM 2.0-Mlb (the MOMP-module-only variant of Mia) and 
a species graph can be generated for the full EARM 2.0-Mla 
model (Supplementary Figure S2). These visualizations 
proved useful in debugging models during the preparation of 
this manuscript, but it was the PySB model code that 
supported straightforward revisions (Figure 5A). The code 
not only contains more information than the visualizations, 
but with only 10 macro calls falling into 5 'motifs' (color-coded 
regions), it compares favorably with even a simplified, 
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hand-drawn 'cartoon' representation (Figure 5B) in terms of 
intelligibility. 

The advantages of using programmatic methods for modeling 
biological pathways are not necessarily evident from simple 
examples in which the underlying equations are self-explanatory 
(e.g., the 'Hello World' model in Figure 1 A) . This has led some to 
dispute the value of such methods and to argue that direct 
modeling in equations is superior. However, as models become 
more realistic and complex, equations rapidly become difficult 
to understand and errors creep in, a problem that was evident 
with the published MOMP models we reinstantiated (Figure 4) . 
The advantages of programmatic abstraction also become more 
evident when a model must be revised or shared, particularly if 
the original developers have moved on to something new. This 
paper shows how PySB, in combination with BNGL and Kappa, 
goes a long way toward addressing this problem. 



The EARM 2.0 models of extrinsic cell death 

One aim of this paper was to create a new model of extrinsic 
apoptosis that incorporated the latest thinking on the 
biochemistry of MOMP while facilitating comparison with 
previously published models. EARM 2.0 includes 15 different 
models for the reactions controlling MOMP, 12 of which were 
previously published in five papers and three of which are 
novel. In general, previously published models of MOMP do 
not explore Bcl-2 biochemistry in the context of a complete 
receptor-to-effector caspase network. Such models also 
simplify the biology of Bcl-2 proteins, representing only a 
subset of the family members. We overcame these limitations 
by modularizing the extrinsic apoptosis pathway and using 
composition to embed different MOMP modules within a 
larger network. Use of the bind_table macro (for modeling 
interactions among members of a multi-protein family) made 
it possible to efficiently encode the differential binding 
affinities of many Bcl-2 proteins for each other. Although the 
molecular interactions included in our EARM 2.0 models are 
not comprehensive, the extensible nature of the PySB 
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representation makes it simple to add additional mechanistic 
details and species in future work. 

One important feature of apoptosis is that MOMP pores do 
not form until a pro-death stimulus has been applied at a 
sufficient concentration for several hours; however, once pores 
form cells die quickly (Goldstein et al, 2000; Rehm et al, 2002; 
Albeck et al, 2008b) . Our original explanation for the all-or- 
none regulation of MOMP was that an anti-apoptotic protein 
such as Bcl-2 or Bcl-XL binds to successively larger Bax and 
Bak oligomers, thereby creating a cooperative inverse relation- 
ship between Bcl-2 and pore levels. This idea has not been 
borne out by experiments and instead it appears that the 
kinetic properties of MOMP must arise from the dual affinity of 
anti-apoptotic proteins for both BH3-only proteins and pore- 
forming effectors (Billen et al, 2008; Llambi et al, 2011). We 
find that an 'embedded together' model (EARM 2.0-Mla) 
incorporating this revised thinking about Bcl-2 family proteins 
(Leber et al, 2010) can reproduce the dynamics of MOMP as 
measured in single cells using live-cell imaging (Spencer et al, 
2009). An additional round of experimental and computa- 
tional 'model discrimination' studies is needed to show 
whether EARM 2.0-Mla is indeed superior to previous models, 
but we prefer it simply on the basis of its faithful recapitulation 
of current knowledge. 



PySB as a means to incremental and collaborative 
model development 

One of the key aims for PySB, and also for BNGL, Kappa and 
related meta-languages, is to promote distributed, incremen- 
tal, and collaborative approaches to modeling. There are both 
technical and conceptual challenges that must be addressed 
for this to be successful. A significant technical hurdle in 
model integration and reuse is the need for a standard 
nomenclature for model species. The SBML community's 
MIRIAM standard is an essential resource in this regard (Le 
Novere et al, 2005), but the way in which rule-based models 
represent species and complexes will demand a modified 
approach to annotation. As a first step, we have implemented a 
basic annotation capability in PySB (see Supplementary Figure 
S3) based on MIRIAM-style subject-object-predicate triples 
that should help resolve naming ambiguities. 

However, the fundamental challenge for integrating and 
reusing models of disparate biological processes remains the 
fact that biological models remain 'fit-to-purpose,' focused on 
addressing specific biological problems or contexts (Krakauer 
et al, 2011). PySB does not prescribe 'universal' approaches to 
representing biological components or processes but instead 
makes fit-to-purpose modeling more transparent and manage- 
able through the use of both hierarchical abstractions 
(Figure 7) and tools for documenting, testing, and tracking 
models drawn from software engineering. In the short-term, 
these features should allow communities of biologists working 
on related biological problems to work in parallel toward 
shared goals; in the longer-term, real- world experimentation 
with approaches for collaborative modeling should yield best 
practices for building broadly reusable models. 

The fact that PySB models are programs allows us to exploit 
the tools and social conventions of the open-source software 
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development community for distributed model development. 
In open-source software, derivative or variant branches of a 
source tree can be spun off and then merged into the principal 
source tree if desired. Version control systems such as Git 
allow this process to be managed and visualized. PySB 
models shared via GitHub can have both private and public 
branches that preserve the integrity of ongoing model 
development while allowing for external contributions. Multi- 
ple groups can develop derivative models with confidence that 
the relationships among variants can be tracked and managed. 
SBML versions of PySB models can also be deposited 
in the BioModels repository, supporting current procedures 
for indexing, citation, and search. Software tests can be written 
to ensure that models and modules behave as documented. As 
models get larger and the scope of the underlying biology 
exceeds the expertise of a single modeling team, tools such as 
PySB will be needed to create reusable, shareable, and 
transparent biological models in a distributed manner — a 
major goal of a systems-level program of biological discovery. 



Materials and methods 
PySB code and documentation 

PySB is freely available under an open-source BSD license. Links to the 
GitHub source code repository as well as documentation and other 
didactic materials are available at http://pysb.org. The EARM 2.0 
models and associated documentation are available along with the 
data used for model calibration at http://sorgerlab.github.com/earm. 
An SBML version of the EARM 2.0-Mla model is included in the 
Supplementary Materials of the paper. 



PySB syntax 

PySB implements a syntax for rules based on that of BNGL using 
classes and overloaded operators from the Python language, lever- 
aging Python as the language parser. Rule definitions are built up using 
patterns (represented internally as instances of the classes Mono- 
merPattern, ComplexPattern, and ReactionPattern) that 
specify both a rule's preconditions and its effects. Site and state 
conditions for monomers are specified using an overloaded 

call method for the Monomer class, which takes site and state 

conditions as keyword arguments. For example, if L is an instance of 
Monomer, L ( s = ' P ' ) specifies L with site s having state P. This use 

of the call method, along with the overloaded operators + , %, 

» , and < > , allow rules to be specified using a syntax that parallels 
that of BNGL and Kappa (Figure 1 A) . See also the 'PySB syntax' section 
of the Supplementary Materials. 



Component self-export 

By default, when a model component is constructed, it is added to the 
current instance of the Model class; a variable referring to the newly 
created component (with a name matching the name passed to the 
component constructor) is also inserted into the global namespace. 
This eliminates the need to retain references to newly created objects 
and explicitly add them to the model. This 'self-export' functionality is 
managed by the PySB core class Self Exporter, which identifies the 
module in which the current Model instance was declared and adds 
global variables for components to that namespace. If alternative 
approaches to component and model management are desired, the 
self-export feature can be disabled by adding the keyword argument 
_export = False to each component constructor. See also the 'PySB 
syntax' section of the Supplementary Materials. 
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ODE integration 

PySB generates the reaction network through an external call to 
BioNetGen and extracts the network graph by parsing the resulting 
. net file. The network graph is then used to build up the right-hand 
sides of the ODEs as SymPy symbolic math expressions containing the 
appropriate rate constants, mass action terms, and stoichiometric 
coefficients. If a C compiler is available, the right-hand side function is 
implemented in C using scipy .weave . inline; otherwise, the 
right-hand side function is evaluated as a Python expression. 
A reference to the right-hand side function is passed to scipy. 
integrate . ode, a generic wrapper library for ODE integration 
routines; for this work, we used the FORTRAN integrator VODE 
(Brown et al, 1989). 



experimental observables as follows: 

Objxotai ( k ) = Obj ICR p ( k ) + Obj IMS RP ( k ) 
+ ObjECRP ( k ) 

where k = k x , k 2 , . . . , k n are the rate parameters for the system of ODEs, 

and ObjicRp(fc), Obji M sRp(fc), and Obj EC Rp(/c) are the objective 
functions for corresponding observables. These objective functions 
were defined as follows: 



Obj ICR p(fc) = l£ 



tBid t ( k ) 



Bido 



" (ICRPt) 



N 



2 • Var(ICRPr) 



Conversion of published models to PySB 

For the models lacking an electronic version (Chen et al, 2007a, b; Cui 
et al, 2008; Howells et al, 2010), the ODEs generated by the PySB 
versions were manually validated against the ODEs listed in the 
original publications. In the case of the three models with errors in the 
published ODEs (Cui et al, 2008) (see Supplementary Note), the PySB 
version was written to generate the ODEs corresponding to the 
described reaction scheme without these errors. For the models for 
which we had access to the original MATLAB code (Albeck et al, 
2008b), the PySB versions were also programmatically validated 
against the output from the published versions. 



Modularization of MOMP models 

Each MOMP-only model (Table II) was written to have the addition of 
tBid as its most upstream event, and the release of cytochrome c and 
Smac as its most downstream event. In some cases, these default 
boundaries did not match the boundaries for the MOMP module in the 
original publications: Albeck et al (2008b) had the addition of active 
caspase-8 as the most upstream event, whereas the Shen/Howells 
group models had Bax oligomerization (rather than Smac release) as 
the most downstream event (Chen et al, 2007a, b; Cui et al, 2008; 
Howells et al, 2010) . In these cases, the networks of the original models 
were modified to achieve consistent boundaries across modules. The 
boundaries of the original models can nevertheless be reproduced 
through the use of optional arguments to the module subroutines that 
add or remove reactions as appropriate. 



Simulation and parameter estimation 

Simulations of EARM 2.0-Mla were carried out using the VODE 
integrator via the SciPy library using Newton's method for root 
evaluations and the backward differentiation formula integration 
method. Absolute and relative tolerances were set to 10 ~ 5 . Parameter 
estimation was performed using the simulated annealing routine 
implemented in scipy . optimize . anneal with an appropriately 
defined objective function (described below). Nominal values for rate 
constants for the DISC and PARP modules were set to their published 
values in EARM 1.0; rate constants for the MOMP module were drawn 
from Certo etal (2006), Willis etal (2005), or set to values from similar 
rates from EARM 1.0. During the annealing process, all rate constants 
were allowed to vary two orders of magnitude above and below their 
nominal values (i.e., 0.01X — 100X); initial protein concentrations were 
held fixed and not estimated. 

Trajectories for the initiator caspase reporter protein (IC-RP), 
mitochondrial inter-membrane space reporter protein (IMS-RP), and 
effector caspase reporter protein (EC-RP) were used from previously 
published data (Spencer et al, 2009). In the model, tBid, cytosolic 
Smac, and cleaved PARP were fit to the data for IC-RP, IMS-RP, and EC- 
RP, respectively. IMS-RP data from 10 cells indicated an average MOMP 
time of 9810 ± 2690 s after the exposure of the cells to ligand. The IC-RP 
and EC-RP signals were normalized and aligned to this MOMP time to 
yield an average trajectory for each. The objective function used to 
calculate model fit was the sum of component functions for each of the 



where N = 112 is the number of experimental time points, tBid t ( k ) is 

the simulated value of tBid at time t (with the given parameters, k ), 
Bid 0 is the total amount of Bid in the simulation, and ICRP f is the 
distribution of MOMP-aligned experimental IC-RP trajectories at time 
t. The objective function for EC-RP was defined similarly, but with 
cleaved PARP in place of tBid and EC-RP in place of IC-RP: 



Obj ECR p(/c) 



-\ 1 



N 



cPARP t ( k ) 

PARPn 



- (ECRPf) 



2 • Var(ECRP t ; 



Data from previous experiments showed that release of IMS-RP from 
mitochondria due to MOMP is both rapid and complete under all tested 
experimental conditions, typically reaching completion within 
1-3 min (Albeck et al, 2008a). To ensure that the model could 
reproduce these kinetics, we used an alternative objective function for 
the cSmac/IMS-RP signal to consider the onset (T D ) and duration (T s ) 
of MOMP: 



Obj n 



p(*) = 



1 



( T^Sim 



< T Ex P) 



T Sim. 



2-Var^ xp ) 



-(?f P )) 2 



2-Var(rf p ) 



/ cSmac N _ -i A 
^Smaco L ) 

2 • 10" 6 



T^ xp is the experimental 'death time,' recorded as the time of MOMP as 
measured by IMS-RP release; Tf xp is the experimental 'switching time,' 
defined as the time required for IMS-RP to be released, estimated to 
have a mean value of 180 s and a variance of 3600 s 2 . The 
corresponding simulated values were defined as 



T--Sim _ 
1 D ~ 



T90 + Tu 



where T 90 and T 10 are the times taken by the simulation to reach 90% 
and 10% of maximum Smac release, respectively. The final term in 

Obj IMSRP (/c) constrains the simulated Smac trajectory to achieve 
approximately 100% release: cSmac^ is the final simulated value of 
cytosolic Smac, and Smac 0 is total Smac. 



Supplementary information 

Supplementary information is available at the Molecular Systems 
Biology website (www.nature.com/msb). 
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